In this project, we are using a dataset of songs on the music streaming app Spotify.
The dataset contains songs on Spotify across multiple genres, and we will perform several analyses on it, including basic descriptive and bivariate statistics, Principal Component Analysis, decision trees, regression, and clustering.
Here is the link to the original dataset
First, we import the data into R and make sure R is reading it properly.
# importing relevant libraries to perform cleaning on the data
library(tidyverse)
library(janitor)
setwd("~/Documents/class/stats-final-project/")
# importing the data and cleaning the names into a snake_case format.
raw_data <- read.csv("dataset.csv") %>% clean_names()
The dataset has 114,000 rows and 21 columns/variables.
It has the following scores (numerical variables):
- popularity: The popularity of a track is a value between 0 and 100, with 100 being the most popular.
- duration_ms: The track length in milliseconds.
- danceability: Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.
- energy: Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale.
- loudness: The overall loudness of a track in decibels (dB).
- speechiness: Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value.
- acousticness: A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.
- instrumentalness: Predicts whether a track contains no vocals. “Ooh” and “aah” sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly “vocal”. The closer the instrumentalness value is to 1.0, the greater the likelihood the track contains no vocal content.
- liveness: Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live.
- valence: A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).
- tempo: The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.
The dataset also has the following categorical variables:
- explicit: Whether or not the track has explicit lyrics (true = yes it does; false = no it does not OR unknown).
- mode: Mode indicates the modality (major or minor) of a track, the type of scale from which its melodic content is derived. Major is represented by 1 and minor is 0.
- key: The key the track is in. Integers map to pitches using standard Pitch Class notation. E.g. 0 = C, 1 = C♯/D♭, 2 = D, and so on. If no key was detected, the value is -1.
- track_genre: The genre to which the track belongs.
It also has the following columns that describe the songs:
- track_id: The Spotify ID for the track.
- artists: The names of the artists who performed the track. If there is more than one artist, they are separated by a ;
- album_name: The name of the album in which the track appears.
- track_name: Name of the track.
Next, we run several checks on the data (dimensions, column names, and column classes).
#dim(raw_data)
#names(raw_data)
#sapply(raw_data, class)
We’ll perform the following transforms to the data in order to prepare it for our analysis:
Since some of the categorical variables (mode, key, and time_signature) are currently coded as numbers, we will do the following:
1. key: We will convert the numbers 0 to 11 to the letter name of the key (0 = C, 1 = C#, etc.).
2. mode: Instead of 0 for minor and 1 for major, we’ll use the labels “minor” and “major”.
3. time_signature: We’ll convert it from numeric to character.
We are also adding 2 more variables:
1. multiple_artists: If there are multiple artists performing the track, the artists column will contain all artists separated by a semicolon (;). We’ll add a true value in this column if there are multiple artists, and false if there is a single artist.
2. tempo_cat: This is a categorical variable based on the tempo column. We’ll use the beats per minute to determine which tempo marking the track fits in. This will be an ordinal variable, with the levels defined.
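The transformations listed above can be tried out on a few made-up rows before they are applied to the full dataset. The sketch below is only an illustration: the `toy` data frame and its values are invented, the key lookup uses plain vector indexing rather than a join, and `ordered_result = TRUE` is an assumption added here to make the tempo factor explicitly ordinal.

```r
# Toy rows, invented for illustration (not taken from the dataset)
toy <- data.frame(
  key     = c(0, 7, 11),
  tempo   = c(0, 90, 128),
  artists = c("Artist A", "Artist A;Artist B", "Artist C"),
  stringsAsFactors = FALSE
)

# Key numbers 0..11 map to letter names; R vectors are 1-indexed, hence key + 1
key_alpha <- c("C", "C#/Db", "D", "D#/Eb", "E", "F",
               "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B")
toy$key_name <- key_alpha[toy$key + 1]

# A semicolon in the artists column signals more than one artist
toy$multiple_artists <- grepl(";", toy$artists)

# cut() builds intervals of the form (a, b], so a tempo of exactly 0 is not covered
toy$tempo_cat <- cut(
  toy$tempo,
  breaks = c(0, 20, 40, 60, 66, 76, 108, 120, 168, 176, 200, 1000),
  labels = c("Larghissimo", "Grave", "Lento/Largo", "Larghetto", "Adagio", "Andante",
             "Moderato", "Allegro", "Vivace", "Presto", "Prestissimo"),
  ordered_result = TRUE  # assumption: request an ordered factor
)

print(toy)
```

In this sketch the track with a tempo of 0 gets `NA` as its tempo marking, because 0 lies outside the first interval (0, 20].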
We are also applying several filters to scope our analysis:
1. Filter to songs by popular artists. This is done by finding the artists that have 20 songs or more and keeping only the songs by those artists.
2. We also scope the analysis to songs that are no longer than 10 minutes.
3. We’ll also remove duplicated songs, because some songs are listed in both album and single versions. This is done by removing songs that have the same values in the track_name and artists columns.
4. We also sample the data down to 3,000 rows/songs. This is done by random sampling using the sample_n() function.
Finally, we’ll just select the columns that are relevant to us in our analysis, and remove the descriptive columns, track_id, artists, album_name, and track_name.
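The filtering steps described above can be sketched on a small invented table. This is a minimal illustration in base R (the project code itself uses dplyr); the `toy` data frame, the threshold of 2 songs instead of 20, and the sample size of 2 instead of 3,000 are all assumptions chosen to keep the example tiny.

```r
# Invented rows; "A" has a duplicated song and one song longer than 10 minutes
toy <- data.frame(
  artists     = c("A", "A", "A", "B", "C", "C"),
  track_name  = c("s1", "s1", "s2", "s3", "s4", "s5"),
  duration_ms = c(200000, 200000, 700000, 180000, 210000, 240000),
  stringsAsFactors = FALSE
)

# Remove duplicates: same track_name and same artists
dedup <- toy[!duplicated(toy[, c("track_name", "artists")]), ]

# Popular artists: at least min_songs songs (the project uses 20)
min_songs <- 2
counts  <- table(dedup$artists)
popular <- names(counts)[counts >= min_songs]

# Keep songs by popular artists that are at most 10 minutes (600000 ms) long
kept <- dedup[dedup$artists %in% popular & dedup$duration_ms <= 600000, ]

# Random sample with a fixed seed so the result is reproducible
set.seed(1)
sampled <- kept[sample(nrow(kept), size = 2), ]
print(sampled)
```

Note that the order of the steps matters: here, as in the project code, duplicates are removed before the songs per artist are counted, so a duplicated track does not help an artist reach the threshold.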
# creating a list of keys from C to B
key_alpha <- c('C','C#/Db','D','D#/Eb','E','F','F#/Gb','G','G#/Ab','A','A#/Bb','B')
# creating a new df for mapping keys
key_map <- data.frame(key = c(0:11),
key_alpha = key_alpha)
data <- raw_data %>%
  # a left join keeps every song and attaches the letter name of its key
  left_join(key_map, by = "key") %>%
  mutate(mode = str_replace(as.character(mode), "0", "minor"),
         mode = str_replace(as.character(mode), "1", "major"),
         time_signature = as.character(time_signature),
         # converting keys to alphabet
         key = key_alpha,
         # adding a column for whether the artist is one or multiple. True for Multiple, false for single
         multiple_artists = grepl(";", artists)
  ) %>%
  # dropping column 1 (the row index read from the CSV) and column 22 (the helper key_alpha column)
  select(-1, -22) %>%
# removing duplicates in songs, because some tracks are in multiple albums
distinct(track_name, artists, .keep_all = TRUE)
# finding the popular artists, with 20 or more songs listed.
popular_artists <- data %>% group_by(artists) %>%
  summarize(count = n()) %>%
  filter(count >= 20)
filtered <- data %>% filter(artists %in% popular_artists$artists) %>%
  filter(duration_ms <= 600000) %>%
  # track_id, artists, album_name and track_name are dropped later, when the sample is drawn
  mutate(
    # adding a categorical variable for tempo; cut() uses intervals of the form (a, b],
    # so a tempo of exactly 0 falls outside (0, 20] and becomes NA
    tempo_cat = cut(tempo,
                    breaks=c(0, 20, 40, 60, 66, 76, 108, 120, 168, 176, 200, 1000),
                    labels=c('Larghissimo','Grave','Lento/Largo','Larghetto','Adagio','Andante','Moderato','Allegro','Vivace','Presto','Prestissimo'))
  )
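# random sampling the data to just 3000 songs
set.seed(1)
# columns 5 to 22 run from popularity to tempo_cat, so this drops track_id, artists, album_name and track_name
dd <- filtered %>% sample_n(3000) %>% select(5:22)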
# attaching the column names
attach(dd)
# for one-time exporting for other analysis
# dd %>% write_csv("cleaneddata.csv")
#exporting dataset with titles for clustering analysis
#filtered %>% relocate(artists, .after = last_col()) %>%
# relocate(album_name, .after = last_col()) %>%
# relocate(track_name, .after = last_col()) %>% select(2:22) %>% write_csv("cleaneddata-withtitles.csv")
n<-dim(dd)[1]
K<-dim(dd)[2]
descriptiva<-function(X, nom){
if (!(is.numeric(X) || class(X)=="Date")){
frecs<-table(as.factor(X), useNA="ifany")
proportions<-frecs/n
# note: decide whether to compute percentages with or without missing values
pie(frecs, cex=0.6, main=paste("Pie of", nom))
barplot(frecs, las=3, cex.names=0.7, main=paste("Barplot of", nom), col=listOfColors)
print(paste("Number of modalities: ", length(frecs)))
print("Frequency table")
print(frecs)
print("Relative frequency table (proportions)")
print(proportions)
print("Frequency table sorted")
print(sort(frecs, decreasing=TRUE))
print("Relative frequency table (proportions) sorted")
print(sort(proportions, decreasing=TRUE))
}else{
if(class(X)=="Date"){
print(summary(X))
print(sd(X))
#decide breaks: weeks, months, quarters...
hist(X,breaks="weeks")
}else{
hist(X, main=paste("Histogram of", nom))
boxplot(X, horizontal=TRUE, main=paste("Boxplot of",nom))
print("Extended Summary Statistics")
print(summary(X))
print(paste("sd: ", sd(X, na.rm=TRUE)))
print(paste("vc: ", sd(X, na.rm=TRUE)/mean(X, na.rm=TRUE)))
}
}
}
dataset<-dd
actives<-c(1:K)
colDate<-1
# leftover branch from the template script, written for a different dataset ("platjaDaro");
# identical() returns a single FALSE here, so the branch is skipped without a warning
if (identical(dataset, "platjaDaro"))
{dd[,colDate]<-as.Date(dd[, colDate], format="%d/%m/%y %h:%m:%s")
actives<-c(3:44)
}
listOfColors<-rainbow(39)
par(ask=FALSE)
for(k in actives){
print(paste("variable ", k, ":", names(dd)[k] ))
descriptiva(dd[,k], names(dd)[k])
}
[1] "variable 1 : popularity"
[1] "Extended Summary Statistics"
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 19.00 30.00 32.78 46.00 97.00
[1] "sd: 19.071854500815"
[1] "vc: 0.581754585688306"
[1] "variable 2 : duration_ms"
[1] "Extended Summary Statistics"
Min. 1st Qu. Median Mean 3rd Qu. Max.
28946 167156 215889 224103 268796 594533
[1] "sd: 86933.4740717619"
[1] "vc: 0.387917069526423"
[1] "variable 3 : explicit"
[1] "Number of modalities: 2"
[1] "Frequency table"
False True
2812 188
[1] "Relative frequency table (proportions)"
False True
0.93733333 0.06266667
[1] "Frequency table sorted"
False True
2812 188
[1] "Relative frequency table (proportions) sorted"
False True
0.93733333 0.06266667
[1] "variable 4 : danceability"
[1] "Extended Summary Statistics"
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000 0.4270 0.5480 0.5410 0.6653 0.9750
[1] "sd: 0.176265638860442"
[1] "vc: 0.325833140160227"
[1] "variable 5 : energy"
[1] "Extended Summary Statistics"
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.000242 0.423000 0.666500 0.624997 0.861250 0.999000
[1] "sd: 0.265941258309461"
[1] "vc: 0.42550798946771"
[1] "variable 6 : key"
[1] "Number of modalities: 12"
[1] "Frequency table"
A A#/Bb B C C#/Db D D#/Eb E F F#/Gb G G#/Ab
326 187 221 366 250 346 90 269 252 160 383 150
[1] "Relative frequency table (proportions)"
A A#/Bb B C C#/Db D D#/Eb E F F#/Gb
0.10866667 0.06233333 0.07366667 0.12200000 0.08333333 0.11533333 0.03000000 0.08966667 0.08400000 0.05333333
G G#/Ab
0.12766667 0.05000000
[1] "Frequency table sorted"
G C D A E F C#/Db B A#/Bb F#/Gb G#/Ab D#/Eb
383 366 346 326 269 252 250 221 187 160 150 90
[1] "Relative frequency table (proportions) sorted"
G C D A E F C#/Db B A#/Bb F#/Gb
0.12766667 0.12200000 0.11533333 0.10866667 0.08966667 0.08400000 0.08333333 0.07366667 0.06233333 0.05333333
G#/Ab D#/Eb
0.05000000 0.03000000
[1] "variable 7 : loudness"
[1] "Extended Summary Statistics"
Min. 1st Qu. Median Mean 3rd Qu. Max.
-42.631 -11.155 -7.498 -8.888 -5.181 0.377
[1] "sd: 5.32258438263559"
[1] "vc: -0.598842498009395"
[1] "variable 8 : mode"
[1] "Number of modalities: 2"
[1] "Frequency table"
major minor
2022 978
[1] "Relative frequency table (proportions)"
major minor
0.674 0.326
[1] "Frequency table sorted"
major minor
2022 978
[1] "Relative frequency table (proportions) sorted"
major minor
0.674 0.326
[1] "variable 9 : speechiness"
[1] "Extended Summary Statistics"
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00000 0.03470 0.04690 0.07862 0.07490 0.96200
[1] "sd: 0.103964236837045"
[1] "vc: 1.32239625374972"
[1] "variable 10 : acousticness"
[1] "Extended Summary Statistics"
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.000001 0.013300 0.217000 0.348478 0.678250 0.996000
[1] "sd: 0.349889758582825"
[1] "vc: 1.00405207937953"
[1] "variable 11 : instrumentalness"
[1] "Extended Summary Statistics"
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000000 0.0000000 0.0000889 0.1887061 0.1462500 1.0000000
[1] "sd: 0.33859412273214"
[1] "vc: 1.7942934790234"
[1] "variable 12 : liveness"
[1] "Extended Summary Statistics"
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0112 0.1010 0.1410 0.2379 0.3070 0.9920
[1] "sd: 0.216531534653923"
[1] "vc: 0.910165401744503"
[1] "variable 13 : valence"
[1] "Extended Summary Statistics"
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000 0.2500 0.4730 0.4771 0.6933 0.9850
[1] "sd: 0.267153566864831"
[1] "vc: 0.559899974951364"
[1] "variable 14 : tempo"
[1] "Extended Summary Statistics"
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0 100.0 121.9 121.9 139.7 220.0
[1] "sd: 29.448628745521"
[1] "vc: 0.241613772469231"
[1] "variable 15 : time_signature"
[1] "Number of modalities: 5"
[1] "Frequency table"
0 1 3 4 5
4 30 271 2636 59
[1] "Relative frequency table (proportions)"
0 1 3 4 5
0.001333333 0.010000000 0.090333333 0.878666667 0.019666667
[1] "Frequency table sorted"
4 3 5 1 0
2636 271 59 30 4
[1] "Relative frequency table (proportions) sorted"
4 3 5 1 0
0.878666667 0.090333333 0.019666667 0.010000000 0.001333333
[1] "variable 16 : track_genre"
[1] "Number of modalities: 100"
[1] "Frequency table"
acoustic afrobeat alt-rock alternative ambient anime
23 41 46 8 41 32
black-metal bluegrass blues brazil breakbeat british
17 41 2 19 36 71
cantopop chicago-house children chill classical club
51 67 98 6 22 24
comedy country dance dancehall death-metal detroit-techno
32 5 2 14 21 79
disco disney drum-and-bass edm electro electronic
4 43 4 3 11 19
emo folk forro garage german gospel
30 18 50 33 24 14
goth grindcore groove grunge guitar happy
47 49 34 44 31 30
hard-rock hardcore heavy-metal hip-hop honky-tonk idm
45 12 103 8 109 61
indian indie indie-pop industrial iranian j-dance
9 1 2 47 44 26
j-idol j-pop j-rock jazz k-pop kids
110 12 16 2 37 84
latin latino malay mandopop metal metalcore
1 3 36 23 14 30
minimal-techno mpb new-age opera pagode party
6 16 63 20 59 20
piano pop pop-film power-pop progressive-house psych-rock
28 1 3 28 4 37
punk punk-rock r-n-b rock-n-roll rockabilly romance
5 17 11 35 29 53
salsa samba sertanejo show-tunes singer-songwriter ska
21 24 44 18 8 40
sleep spanish study swedish synth-pop tango
34 11 74 7 18 45
trance trip-hop turkish world-music
10 27 7 56
[1] "Relative frequency table (proportions)"
acoustic afrobeat alt-rock alternative ambient anime
0.0076666667 0.0136666667 0.0153333333 0.0026666667 0.0136666667 0.0106666667
black-metal bluegrass blues brazil breakbeat british
0.0056666667 0.0136666667 0.0006666667 0.0063333333 0.0120000000 0.0236666667
cantopop chicago-house children chill classical club
0.0170000000 0.0223333333 0.0326666667 0.0020000000 0.0073333333 0.0080000000
comedy country dance dancehall death-metal detroit-techno
0.0106666667 0.0016666667 0.0006666667 0.0046666667 0.0070000000 0.0263333333
disco disney drum-and-bass edm electro electronic
0.0013333333 0.0143333333 0.0013333333 0.0010000000 0.0036666667 0.0063333333
emo folk forro garage german gospel
0.0100000000 0.0060000000 0.0166666667 0.0110000000 0.0080000000 0.0046666667
goth grindcore groove grunge guitar happy
0.0156666667 0.0163333333 0.0113333333 0.0146666667 0.0103333333 0.0100000000
hard-rock hardcore heavy-metal hip-hop honky-tonk idm
0.0150000000 0.0040000000 0.0343333333 0.0026666667 0.0363333333 0.0203333333
indian indie indie-pop industrial iranian j-dance
0.0030000000 0.0003333333 0.0006666667 0.0156666667 0.0146666667 0.0086666667
j-idol j-pop j-rock jazz k-pop kids
0.0366666667 0.0040000000 0.0053333333 0.0006666667 0.0123333333 0.0280000000
latin latino malay mandopop metal metalcore
0.0003333333 0.0010000000 0.0120000000 0.0076666667 0.0046666667 0.0100000000
minimal-techno mpb new-age opera pagode party
0.0020000000 0.0053333333 0.0210000000 0.0066666667 0.0196666667 0.0066666667
piano pop pop-film power-pop progressive-house psych-rock
0.0093333333 0.0003333333 0.0010000000 0.0093333333 0.0013333333 0.0123333333
punk punk-rock r-n-b rock-n-roll rockabilly romance
0.0016666667 0.0056666667 0.0036666667 0.0116666667 0.0096666667 0.0176666667
salsa samba sertanejo show-tunes singer-songwriter ska
0.0070000000 0.0080000000 0.0146666667 0.0060000000 0.0026666667 0.0133333333
sleep spanish study swedish synth-pop tango
0.0113333333 0.0036666667 0.0246666667 0.0023333333 0.0060000000 0.0150000000
trance trip-hop turkish world-music
0.0033333333 0.0090000000 0.0023333333 0.0186666667
[1] "Frequency table sorted"
j-idol honky-tonk heavy-metal children kids detroit-techno
110 109 103 98 84 79
study british chicago-house new-age idm pagode
74 71 67 63 61 59
world-music romance cantopop forro grindcore goth
56 53 51 50 49 47
industrial alt-rock hard-rock tango grunge iranian
47 46 45 45 44 44
sertanejo disney afrobeat ambient bluegrass ska
44 43 41 41 41 40
k-pop psych-rock breakbeat malay rock-n-roll groove
37 37 36 36 35 34
sleep garage anime comedy guitar emo
34 33 32 32 31 30
happy metalcore rockabilly piano power-pop trip-hop
30 30 29 28 28 27
j-dance club german samba acoustic mandopop
26 24 24 24 23 23
classical death-metal salsa opera party brazil
22 21 21 20 20 19
electronic folk show-tunes synth-pop black-metal punk-rock
19 18 18 18 17 17
j-rock mpb dancehall gospel metal hardcore
16 16 14 14 14 12
j-pop electro r-n-b spanish trance indian
12 11 11 11 10 9
alternative hip-hop singer-songwriter swedish turkish chill
8 8 8 7 7 6
minimal-techno country punk disco drum-and-bass progressive-house
6 5 5 4 4 4
edm latino pop-film blues dance indie-pop
3 3 3 2 2 2
jazz indie latin pop
2 1 1 1
[1] "Relative frequency table (proportions) sorted"
j-idol honky-tonk heavy-metal children kids detroit-techno
0.0366666667 0.0363333333 0.0343333333 0.0326666667 0.0280000000 0.0263333333
study british chicago-house new-age idm pagode
0.0246666667 0.0236666667 0.0223333333 0.0210000000 0.0203333333 0.0196666667
world-music romance cantopop forro grindcore goth
0.0186666667 0.0176666667 0.0170000000 0.0166666667 0.0163333333 0.0156666667
industrial alt-rock hard-rock tango grunge iranian
0.0156666667 0.0153333333 0.0150000000 0.0150000000 0.0146666667 0.0146666667
sertanejo disney afrobeat ambient bluegrass ska
0.0146666667 0.0143333333 0.0136666667 0.0136666667 0.0136666667 0.0133333333
k-pop psych-rock breakbeat malay rock-n-roll groove
0.0123333333 0.0123333333 0.0120000000 0.0120000000 0.0116666667 0.0113333333
sleep garage anime comedy guitar emo
0.0113333333 0.0110000000 0.0106666667 0.0106666667 0.0103333333 0.0100000000
happy metalcore rockabilly piano power-pop trip-hop
0.0100000000 0.0100000000 0.0096666667 0.0093333333 0.0093333333 0.0090000000
j-dance club german samba acoustic mandopop
0.0086666667 0.0080000000 0.0080000000 0.0080000000 0.0076666667 0.0076666667
classical death-metal salsa opera party brazil
0.0073333333 0.0070000000 0.0070000000 0.0066666667 0.0066666667 0.0063333333
electronic folk show-tunes synth-pop black-metal punk-rock
0.0063333333 0.0060000000 0.0060000000 0.0060000000 0.0056666667 0.0056666667
j-rock mpb dancehall gospel metal hardcore
0.0053333333 0.0053333333 0.0046666667 0.0046666667 0.0046666667 0.0040000000
j-pop electro r-n-b spanish trance indian
0.0040000000 0.0036666667 0.0036666667 0.0036666667 0.0033333333 0.0030000000
alternative hip-hop singer-songwriter swedish turkish chill
0.0026666667 0.0026666667 0.0026666667 0.0023333333 0.0023333333 0.0020000000
minimal-techno country punk disco drum-and-bass progressive-house
0.0020000000 0.0016666667 0.0016666667 0.0013333333 0.0013333333 0.0013333333
edm latino pop-film blues dance indie-pop
0.0010000000 0.0010000000 0.0010000000 0.0006666667 0.0006666667 0.0006666667
jazz indie latin pop
0.0006666667 0.0003333333 0.0003333333 0.0003333333
[1] "variable 17 : multiple_artists"
[1] "Number of modalities: 2"
[1] "Frequency table"
FALSE TRUE
2927 73
[1] "Relative frequency table (proportions)"
FALSE TRUE
0.97566667 0.02433333
[1] "Frequency table sorted"
FALSE TRUE
2927 73
[1] "Relative frequency table (proportions) sorted"
FALSE TRUE
0.97566667 0.02433333
[1] "variable 18 : tempo_cat"
[1] "Number of modalities: 12"
[1] "Frequency table"
Larghissimo Grave Lento/Largo Larghetto Adagio Andante Moderato Allegro Vivace
0 0 10 23 96 858 422 1335 119
Presto Prestissimo <NA>
116 17 4
[1] "Relative frequency table (proportions)"
Larghissimo Grave Lento/Largo Larghetto Adagio Andante Moderato Allegro Vivace
0.000000000 0.000000000 0.003333333 0.007666667 0.032000000 0.286000000 0.140666667 0.445000000 0.039666667
Presto Prestissimo <NA>
0.038666667 0.005666667 0.001333333
[1] "Frequency table sorted"
Allegro Andante Moderato Vivace Presto Adagio Larghetto Prestissimo Lento/Largo
1335 858 422 119 116 96 23 17 10
<NA> Larghissimo Grave
4 0 0
[1] "Relative frequency table (proportions) sorted"
Allegro Andante Moderato Vivace Presto Adagio Larghetto Prestissimo Lento/Largo
0.445000000 0.286000000 0.140666667 0.039666667 0.038666667 0.032000000 0.007666667 0.005666667 0.003333333
<NA> Larghissimo Grave
0.001333333 0.000000000 0.000000000
par(ask=FALSE)
# to export R figures programmatically
#dev.off()
#png(file=mypath,width = 950, height = 800, units = "px")
#dev.off()
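The `vc` value printed for each numerical variable above is the coefficient of variation, that is, the standard deviation divided by the mean. A minimal sketch with invented numbers shows how it is computed and why the value reported for loudness is negative: loudness has a negative mean, and dividing a positive standard deviation by a negative mean gives a negative ratio that is hard to interpret as relative spread. The helper name `coef_var` is hypothetical.

```r
# Hypothetical helper: coefficient of variation, ignoring missing values
coef_var <- function(x) {
  x <- x[!is.na(x)]
  sd(x) / mean(x)
}

# Invented values: same spread, opposite sign of the mean
toy_positive <- c(2, 4, 6)
toy_negative <- c(-2, -4, -6)

cv_positive <- coef_var(toy_positive)
cv_negative <- coef_var(toy_negative)
print(c(cv_positive, cv_negative))
```

For variables measured on a scale with negative values, such as decibels here, the standard deviation on its own is usually the more informative summary.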
After seeing the basic descriptive statistics of the data, we’ll do a bivariate statistics analysis. The purpose is to find relationships between:
1. Categorical vs categorical variables
2. Categorical vs numerical variables
3. Numerical vs numerical variables
We examine the relationship between tempo marking and mode.
library(ggplot2)
# stacked bar chart
ggplot(dd,
aes(x = tempo_cat,
fill = mode)) +
geom_bar(position = "stack")
ggplot(dd,
aes(x = tempo_cat,
fill = mode)) +
geom_bar(position = "dodge")
ggplot(dd,
aes(x = tempo_cat,
fill = mode)) +
geom_bar(position = "fill")
From the bar charts above, we see that although most songs are in a major mode, some tempo markings have a higher proportion of minor songs than others. Larghetto and Vivace have the highest proportions of minor songs, which is interesting because Larghetto is on the slower end and Vivace is on the faster end.
Next, we can also examine the relationship between mode and explicitness.
# stacked bar chart
ggplot(dd,
aes(x = explicit,
fill = mode)) +
geom_bar(position = "stack")
ggplot(dd,
aes(x = explicit,
fill = mode)) +
geom_bar(position = "dodge")
ggplot(dd,
aes(x = explicit,
fill = mode)) +
geom_bar(position = "fill")
From the plots above, very few songs are explicit, so the relationship is hard to read. The proportion of minor songs is slightly higher among explicit songs than among non-explicit ones (about 35% versus 32%), but the difference is minimal.
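A chi-squared test of independence puts a number on this visual impression. The sketch below rebuilds the explicit-by-mode cross table from the counts printed by the profiling script later in this document and runs `chisq.test()` on it; the object names `explicit_mode` and `chi` are chosen here for illustration.

```r
# Counts copied from the "Cross Table" output for the explicit variable
explicit_mode <- matrix(
  c(1899, 913,
     123,  65),
  nrow = 2, byrow = TRUE,
  dimnames = list(explicit = c("False", "True"), mode = c("major", "minor"))
)

# For a 2 x 2 table chisq.test() applies Yates' continuity correction by default
chi <- chisq.test(explicit_mode)
print(chi$statistic)
print(chi$p.value)

# Share of minor songs within each level of explicit
minor_share <- prop.table(explicit_mode, margin = 1)[, "minor"]
print(round(minor_share, 3))
```

The statistic agrees with the value reported in the profiling output (X-squared = 0.26645, p-value = 0.6057), so this sample gives no evidence of an association between explicitness and mode.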
We can also examine the relationship between track_genre and mode.
# stacked bar chart
ggplot(dd,
aes(x = track_genre,
fill = mode)) +
geom_bar(position = "stack") + scale_x_discrete(guide = guide_axis(angle = 90))
ggplot(dd,
aes(x = track_genre,
fill = mode)) +
geom_bar(position = "dodge") + scale_x_discrete(guide = guide_axis(angle = 90))
ggplot(dd,
aes(x = track_genre,
fill = mode)) +
geom_bar(position = "fill") + scale_x_discrete(guide = guide_axis(angle = 90)) +
theme(legend.key.size = unit(0.5, 'cm'), #change legend key size
legend.key.height = unit(0.5, 'cm'), #change legend key height
legend.key.width = unit(0.5, 'cm'), #change legend key width
legend.title = element_text(size=5), #change legend title font size
legend.text = element_text(size=5)) +
theme(text = element_text(size = 7))
# getting the results of the proportion bar chart in a table
dd %>% group_by(track_genre, mode) %>%
summarise(n = n()) %>%
mutate(freq = n / sum(n)) %>%
filter(mode == "minor") %>%
arrange(desc(freq))
`summarise()` has grouped output by 'track_genre'. You can override using the `.groups` argument.
We can see that some genres stand out for having a majority of minor songs. All latin songs are in a minor key, although the sample contains only one latin song. Synth-pop, turkish, trance, dancehall, romance, spanish, anime and hip-hop are also among the top 10 genres by proportion of minor songs.
So, there seems to be a relationship between genre and mode.
Next, we examine the relationship between track_genre and explicitness.
# stacked bar chart
ggplot(dd,
aes(x = track_genre,
fill = explicit)) +
geom_bar(position = "stack") + scale_x_discrete(guide = guide_axis(angle = 90))
ggplot(dd,
aes(x = track_genre,
fill = explicit)) +
geom_bar(position = "dodge") + scale_x_discrete(guide = guide_axis(angle = 90))
ggplot(dd,
aes(x = track_genre,
fill = explicit)) +
geom_bar(position = "fill") + scale_x_discrete(guide = guide_axis(angle = 90)) +
theme(legend.key.size = unit(0.5, 'cm'), #change legend key size
legend.key.height = unit(0.5, 'cm'), #change legend key height
legend.key.width = unit(0.5, 'cm'), #change legend key width
legend.title = element_text(size=5), #change legend title font size
legend.text = element_text(size=5)) +
theme(text = element_text(size = 7))
# getting the results of the proportion bar chart in a table
dd %>% group_by(track_genre, explicit) %>%
summarise(n = n()) %>%
mutate(freq = n / sum(n)) %>%
filter(explicit == "True") %>%
arrange(desc(freq))
`summarise()` has grouped output by 'track_genre'. You can override using the `.groups` argument.
Explicitness and genre also seem to be related, as explicit songs are concentrated in certain genres. All latino songs in the sample are explicit, although there are only three of them. Comedy, country, dance and some of the metal genres also tend to have explicit lyrics.
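Because several genres contain only a handful of songs in the 3,000-song sample, a proportion of 100% can rest on one or two tracks. One way to keep this visible is to report the group size next to each proportion and, optionally, to set a minimum group size. The sketch below uses invented data and base R; the names `toy`, `summary_tbl`, `reliable` and the threshold `min_n` are assumptions made for illustration.

```r
# Invented songs: genre g2 has a single (explicit) song
toy <- data.frame(
  track_genre = c("g1", "g1", "g1", "g1", "g2", "g3", "g3"),
  explicit    = c("True", "False", "False", "False", "True", "False", "True"),
  stringsAsFactors = FALSE
)

# Group size and share of explicit songs per genre
n_songs       <- tapply(toy$explicit, toy$track_genre, length)
prop_explicit <- tapply(toy$explicit == "True", toy$track_genre, mean)

summary_tbl <- data.frame(
  track_genre   = names(n_songs),
  n             = as.vector(n_songs),
  prop_explicit = as.vector(prop_explicit),
  stringsAsFactors = FALSE
)

# Sort by proportion, highest first
summary_tbl <- summary_tbl[order(-summary_tbl$prop_explicit), ]
print(summary_tbl)

# Keep only genres with at least min_n songs before drawing conclusions
min_n <- 3
reliable <- summary_tbl[summary_tbl$n >= min_n, ]
print(reliable)
```

In the toy data the genre with the highest proportion is also the one with a single song, which is the pattern to watch for in the tables above.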
Next, we examine the relationship between key and mode.
# stacked bar chart
ggplot(dd,
aes(x = key,
fill = mode)) +
geom_bar(position = "stack")
ggplot(dd,
aes(x = key,
fill = mode)) +
geom_bar(position = "dodge")
ggplot(dd,
aes(x = key,
fill = mode)) +
geom_bar(position = "fill")
# getting the results of the proportion bar chart in a table
dd %>% group_by(key, mode) %>%
summarise(n = n()) %>%
mutate(freq = n / sum(n)) %>%
filter(mode == "minor") %>%
arrange(desc(freq))
`summarise()` has grouped output by 'key'. You can override using the `.groups` argument.
The key of B has the highest proportion of minor songs (about 54%) and is the only key in which minor songs outnumber major ones. F#/Gb, E and A#/Bb also have relatively high proportions of minor songs (roughly 43% to 46%).
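These proportions can be recomputed directly from the key-by-mode cross table that the profiling script prints later in this document. The sketch below copies those counts into a matrix and uses `prop.table()` with `margin = 1` to get the share of each mode within each key; the object names are chosen here for illustration.

```r
# Counts copied from the "Cross Table" output for the key variable (major, minor)
key_mode <- matrix(
  c(202, 124,  106,  81,  102, 119,  291,  75,
    170,  80,  266,  80,   66,  24,  149, 120,
    151, 101,   87,  73,  315,  68,  117,  33),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    key  = c("A", "A#/Bb", "B", "C", "C#/Db", "D",
             "D#/Eb", "E", "F", "F#/Gb", "G", "G#/Ab"),
    mode = c("major", "minor")
  )
)

# Row proportions: share of major and minor songs within each key
row_prop <- prop.table(key_mode, margin = 1)

# Share of minor songs, highest first
minor_share <- sort(row_prop[, "minor"], decreasing = TRUE)
print(round(minor_share, 3))
```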
We can also perform analysis on categorical vs numerical variables by using charts such as multiple boxplots to plot the distribution of one numerical variable given another categorical variable.
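As a small illustration of the categorical-versus-numerical case, the sketch below compares a numerical variable across two groups with the same two tests that the profiling script uses: `oneway.test()`, which by default does not assume equal variances (Welch's test), and `kruskal.test()`, its rank-based counterpart. The data are simulated, so the group means and the resulting p-values say nothing about the Spotify data.

```r
# Simulated data: two groups with slightly different means (values are invented)
set.seed(1)
toy <- data.frame(
  mode   = factor(rep(c("major", "minor"), each = 30)),
  energy = c(rnorm(30, mean = 0.60, sd = 0.10),
             rnorm(30, mean = 0.70, sd = 0.10))
)

# Group means, the numbers a bar chart of means would show
group_means <- tapply(toy$energy, toy$mode, mean)
print(group_means)

# Welch one-way test (var.equal = FALSE is the default)
welch <- oneway.test(energy ~ mode, data = toy)
print(welch$p.value)

# Kruskal-Wallis rank sum test, which does not assume normality
kw <- kruskal.test(energy ~ mode, data = toy)
print(kw$p.value)
```

A call such as `boxplot(energy ~ mode, data = toy, horizontal = TRUE)` would draw the corresponding multiple boxplot.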
We used code by Dr. Karina Gibert to get an overview of the variables; the interesting plots are shown below, drawn with ggplot().
First, the code below defines functions that compute test values for numerical and qualitative variables against a class variable.
# Computes the test values of the variable Xnum for all levels of the factor P
ValorTestXnum <- function(Xnum,P){
  # frequency distribution of the factor
  nk <- as.vector(table(P))
  n <- sum(nk)
  # group means
  xk <- tapply(Xnum,P,mean)
  # test values
  txk <- (xk-mean(Xnum))/(sd(Xnum)*sqrt((n-nk)/(n*nk)))
  # p-values (upper tail)
  pxk <- pt(txk,n-1,lower.tail=FALSE)
  # p-values above 0.5 are replaced by 1 - p, so the smaller tail is reported
  for(c in 1:length(levels(as.factor(P)))){if (pxk[c]>0.5){pxk[c]<-1-pxk[c]}}
  return (pxk)
}
ValorTestXquali <- function(P,Xquali){
taula <- table(P,Xquali);
n <- sum(taula);
pk <- apply(taula,1,sum)/n;
pj <- apply(taula,2,sum)/n;
pf <- taula/(n*pk);
pjm <- matrix(data=pj,nrow=dim(pf)[1],ncol=dim(pf)[2], byrow=TRUE);
dpf <- pf - pjm;
dvt <- sqrt(((1-pk)/(n*pk))%*%t(pj*(1-pj)));
# if there are divisions equal to 0 this gives NA and does not work
zkj <- dpf
zkj[dpf!=0]<-dpf[dpf!=0]/dvt[dpf!=0];
pzkj <- pnorm(zkj,lower.tail=F);
for(c in 1:length(levels(as.factor(P)))){for (s in 1:length(levels(Xquali))){if (pzkj[c,s]> 0.5){pzkj[c,s]<-1- pzkj[c,s]}}}
return (list(rowpf=pf,vtest=zkj,pval=pzkj))
}
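For reference, the quantities computed by the two functions above can be written out as formulas. This is a reading of the code, not a formula quoted from the original author, and the symbols are chosen here for illustration: n is the total number of songs, n_k the size of class k, p_k = n_k / n its share, \bar{x}_k the mean of the numerical variable within class k, \bar{x} and s its overall mean and standard deviation, p_{j|k} the proportion of category j within class k, and p_j the overall proportion of category j.

```latex
t_k = \frac{\bar{x}_k - \bar{x}}{s \,\sqrt{\dfrac{n - n_k}{n\, n_k}}}
\qquad\qquad
z_{kj} = \frac{p_{j \mid k} - p_j}{\sqrt{\dfrac{1 - p_k}{n\, p_k}\; p_j\,(1 - p_j)}}
\qquad\text{with}\qquad
\frac{1 - p_k}{n\, p_k} = \frac{n - n_k}{n\, n_k}
```

ValorTestXnum turns t_k into an upper-tail p-value from a Student t distribution with n - 1 degrees of freedom, and ValorTestXquali turns z_{kj} into an upper-tail p-value from a standard normal distribution. In both functions a p-value above 0.5 is replaced by one minus itself, so the value reported is the smaller of the two tails.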
Let’s run the profiling script for the mode variable
#Data is referred to as "dades" in the following code
dades<-dd
K<-dim(dades)[2]
#par(ask=TRUE)
#P must contain the class variable
P<-dd$mode
nameP<-"mode"
nc<-length(levels(as.factor(P)))
pvalk <- matrix(data=0,nrow=nc,ncol=K, dimnames=list(levels(P),names(dades)))
nameP<-"mode"
n<-dim(dades)[1]
for(k in 1:K){
if (is.numeric(dades[,k])){
print(paste("Analysis by class of the Variable:", names(dades)[k]))
boxplot(dades[,k]~P, main=paste("Boxplot of", names(dades)[k], "vs", nameP ), horizontal=TRUE)
barplot(tapply(dades[[k]], P, mean),main=paste("Means of", names(dades)[k], "by", nameP ))
abline(h=mean(dades[[k]]))
legend(0,mean(dades[[k]]),"global mean",bty="n")
print("Statistics by groups:")
for(s in levels(as.factor(P))) {print(summary(dades[P==s,k]))}
o<-oneway.test(dades[,k]~P)
print(paste("p-valueANOVA:", o$p.value))
kw<-kruskal.test(dades[,k]~P)
print(paste("p-value Kruskal-Wallis:", kw$p.value))
pvalk[,k]<-ValorTestXnum(dades[,k], P)
print("p-values ValorsTest: ")
print(pvalk[,k])
}else{
if(class(dd[,k])=="Date"){
print(summary(dd[,k]))
print(sd(dd[,k]))
#decide breaks: weeks, months, quarters...
hist(dd[,k],breaks="weeks")
}else{
#qualitatives
print(paste("Variable", names(dades)[k]))
table<-table(P,dades[,k])
# print("Cross-table")
# print(table)
rowperc<-prop.table(table,1)
colperc<-prop.table(table,2)
# print("Distributions conditioned on rows")
# print(rowperc)
# careful: a TRUE/FALSE variable is read as type logical,
# which has no levels, so we coerce to factor as a precaution
dades[,k]<-as.factor(dades[,k])
marg <- table(as.factor(P))/n
print(append("Categories=",levels(as.factor(dades[,k]))))
#from next plots, select one of them according to your practical case
#with legend
plot(marg,type="l",ylim=c(0,1),main=paste("Prop. of major & minor by",names(dades)[k]))
paleta<-rainbow(length(levels(dades[,k])))
for(c in 1:length(levels(dades[,k]))){lines(colperc[,c],col=paleta[c]) }
legend("topright", levels(dades[,k]), col=paleta, lty=2, cex=0.6)
# conditioned on classes
#with legend
plot(marg,type="n",ylim=c(0,1),main=paste("Prop. of major & minor by",names(dades)[k]))
paleta<-rainbow(length(levels(dades[,k])))
for(c in 1:length(levels(dades[,k]))){lines(rowperc[,c],col=paleta[c]) }
legend("topright", levels(dades[,k]), col=paleta, lty=2, cex=0.6)
# with the variable on the x-axis
marg <-table(dades[,k])/n
print(append("Categories=",levels(dades[,k])))
#with legend
plot(marg,type="l",ylim=c(0,1),main=paste("Prop. of major & minor by",names(dades)[k]), las=3)
for(c in 1:length(levels(as.factor(P)))){lines(rowperc[c,],col=paleta[c])}
legend("topright", levels(as.factor(P)), col=paleta, lty=2, cex=0.6)
# conditioned on columns
#with legend
plot(marg,type="n",ylim=c(0,1),main=paste("Prop. of major & minor by",names(dades)[k]), las=3)
for(c in 1:length(levels(as.factor(P)))){lines(colperc[c,],col=paleta[c])}
legend("topright", levels(as.factor(P)), col=paleta, lty=2, cex=0.6)
table<-table(dades[,k],P)
print("Cross Table:")
print(table)
print("Distribucions condicionades a columnes:")
print(colperc)
# stacked bar charts
paleta<-rainbow(length(levels(dades[,k])))
barplot(table(dades[,k], as.factor(P)), beside=FALSE,col=paleta )
legend("topright",levels(as.factor(dades[,k])),pch=1,cex=0.5, col=paleta)
# side-by-side bar charts
barplot(table(dades[,k], as.factor(P)), beside=TRUE,col=paleta)
legend("topright",levels(as.factor(dades[,k])),pch=1,cex=0.5, col=paleta)
print("Test Chi quadrat: ")
print(chisq.test(dades[,k], as.factor(P)))
print("valorsTest:")
print( ValorTestXquali(P,dades[,k]))
# compute the p-values for the qualitative variables
}
}
}#endfor
[1] "Analysis by class of the Variable: popularity"
[1] "Statistics by groups:"
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 20.00 31.00 33.29 46.00 93.00
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 17.00 29.00 31.73 46.00 97.00
[1] "p-valueANOVA: 0.0400221586551545"
[1] "p-value Kruskal-Wallis: 0.0120068574469301"
[1] "p-values ValorsTest: "
[1] 0.01756552 0.01756552
[1] "Analysis by class of the Variable: duration_ms"
[1] "Statistics by groups:"
Min. 1st Qu. Median Mean 3rd Qu. Max.
28946 164319 213993 220668 265283 594533
Min. 1st Qu. Median Mean 3rd Qu. Max.
30622 171994 221756 231206 277523 579546
[1] "p-valueANOVA: 0.00239712152994276"
[1] "p-value Kruskal-Wallis: 0.00437920349247436"
[1] "p-values ValorsTest: "
[1] 0.0009370723 0.0009370723
[1] "Variable explicit"
[1] "Categories=" "False" "True"
[1] "Categories=" "False" "True"
[1] "Cross Table:"
P
major minor
False 1899 913
True 123 65
[1] "Distribucions condicionades a columnes:"
P False True
major 0.6753201 0.6542553
minor 0.3246799 0.3457447
[1] "Test Chi quadrat: "
Pearson's Chi-squared test with Yates' continuity correction
data: dades[, k] and as.factor(P)
X-squared = 0.26645, df = 1, p-value = 0.6057
[1] "valorsTest:"
$rowpf
Xquali
P False True
major 0.93916914 0.06083086
minor 0.93353783 0.06646217
$vtest
Xquali
P False True
major 0.5965451 -0.5965451
minor -0.5965451 0.5965451
$pval
Xquali
P False True
major 0.2754056 0.2754056
minor 0.2754056 0.2754056
[1] "Analysis by class of the Variable: danceability"
[1] "Statistics by groups:"
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000 0.4290 0.5460 0.5412 0.6637 0.9750
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000 0.4243 0.5570 0.5405 0.6690 0.9530
[1] "p-valueANOVA: 0.919396530566418"
[1] "p-value Kruskal-Wallis: 0.542364091950772"
[1] "p-values ValorsTest: "
[1] 0.4592491 0.4592491
[1] "Analysis by class of the Variable: energy"
[1] "Statistics by groups:"
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.000242 0.401250 0.635000 0.608990 0.851000 0.999000
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00983 0.48900 0.71100 0.65809 0.87475 0.99900
[1] "p-valueANOVA: 1.42880872470579e-06"
[1] "p-value Kruskal-Wallis: 3.11239888728683e-06"
[1] "p-values ValorsTest: "
[1] 1.116477e-06 1.116477e-06
[1] "Variable key"
[1] "Categories=" "A" "A#/Bb" "B" "C" "C#/Db" "D"
[8] "D#/Eb" "E" "F" "F#/Gb" "G" "G#/Ab"
[1] "Categories=" "A" "A#/Bb" "B" "C" "C#/Db" "D"
[8] "D#/Eb" "E" "F" "F#/Gb" "G" "G#/Ab"
[1] "Cross Table:"
P
major minor
A 202 124
A#/Bb 106 81
B 102 119
C 291 75
C#/Db 170 80
D 266 80
D#/Eb 66 24
E 149 120
F 151 101
F#/Gb 87 73
G 315 68
G#/Ab 117 33
[1] "Distribucions condicionades a columnes:"
P A A#/Bb B C C#/Db D D#/Eb E F F#/Gb
major 0.6196319 0.5668449 0.4615385 0.7950820 0.6800000 0.7687861 0.7333333 0.5539033 0.5992063 0.5437500
minor 0.3803681 0.4331551 0.5384615 0.2049180 0.3200000 0.2312139 0.2666667 0.4460967 0.4007937 0.4562500
P G G#/Ab
major 0.8224543 0.7800000
minor 0.1775457 0.2200000
[1] "Test Chi quadrat: "
Pearson's Chi-squared test
data: dades[, k] and as.factor(P)
X-squared = 182.12, df = 11, p-value < 2.2e-16
[1] "valorsTest:"
$rowpf
Xquali
P A A#/Bb B C C#/Db D D#/Eb E F
major 0.09990109 0.05242334 0.05044510 0.14391691 0.08407517 0.13155292 0.03264095 0.07368942 0.07467854
minor 0.12678937 0.08282209 0.12167689 0.07668712 0.08179959 0.08179959 0.02453988 0.12269939 0.10327198
Xquali
P F#/Gb G G#/Ab
major 0.04302671 0.15578635 0.05786350
minor 0.07464213 0.06952965 0.03374233
$vtest
Xquali
P A A#/Bb B C C#/Db D D#/Eb E F
major -2.2181664 -3.2282754 -7.0009031 5.2739257 0.2113863 3.9990262 1.2192574 -4.4042106 -2.6465403
minor 2.2181664 3.2282754 7.0009031 -5.2739257 -0.2113863 -3.9990262 -1.2192574 4.4042106 2.6465403
Xquali
P F#/Gb G G#/Ab
major -3.6124383 6.6360891 2.8415215
minor 3.6124383 -6.6360891 -2.8415215
$pval
Xquali
P A A#/Bb B C C#/Db D D#/Eb
major 1.327175e-02 6.226950e-04 1.271538e-12 6.676800e-08 4.162929e-01 3.180182e-05 1.113733e-01
minor 1.327175e-02 6.226950e-04 1.271589e-12 6.676800e-08 4.162929e-01 3.180182e-05 1.113733e-01
Xquali
P E F F#/Gb G G#/Ab
major 5.308488e-06 4.065990e-03 1.516657e-04 1.610576e-11 2.244941e-03
minor 5.308488e-06 4.065990e-03 1.516657e-04 1.610578e-11 2.244941e-03
[1] "Analysis by class of the Variable: loudness"
[1] "Statistics by groups:"
Min. 1st Qu. Median Mean 3rd Qu. Max.
-37.417 -11.354 -7.519 -8.891 -5.186 0.377
Min. 1st Qu. Median Mean 3rd Qu. Max.
-42.631 -10.707 -7.438 -8.883 -5.177 -1.134
[1] "p-valueANOVA: 0.971118589129941"
[1] "p-value Kruskal-Wallis: 0.592448123464342"
[1] "p-values ValorsTest: "
[1] 0.4853843 0.4853843
[1] "Variable mode"
[1] "Categories=" "major" "minor"
[1] "Categories=" "major" "minor"
[1] "Cross Table:"
P
major minor
major 2022 0
minor 0 978
[1] "Distribucions condicionades a columnes:"
P major minor
major 1 0
minor 0 1
[1] "Test Chi quadrat: "
Pearson's Chi-squared test with Yates' continuity correction
data: dades[, k] and as.factor(P)
X-squared = 2995.5, df = 1, p-value < 2.2e-16
[1] "valorsTest:"
$rowpf
Xquali
P major minor
major 1 0
minor 0 1
$vtest
Xquali
P major minor
major 54.77226 -54.77226
minor -54.77226 54.77226
$pval
Xquali
P major minor
major 0 0
minor 0 0
[1] "Analysis by class of the Variable: speechiness"
[1] "Statistics by groups:"
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00000 0.03380 0.04490 0.07438 0.07130 0.96200
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00000 0.03712 0.05025 0.08738 0.08165 0.95500
[1] "p-valueANOVA: 0.00233148377089234"
[1] "p-value Kruskal-Wallis: 6.65362573251419e-08"
[1] "p-values ValorsTest: "
[1] 0.0006716785 0.0006716785
[1] "Analysis by class of the Variable: acousticness"
[1] "Statistics by groups:"
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.000001 0.018950 0.270000 0.368455 0.703750 0.996000
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000011 0.0055125 0.1380000 0.3071756 0.6167500 0.9960000
[1] "p-valueANOVA: 7.06558400528732e-06"
[1] "p-value Kruskal-Wallis: 1.31911871199142e-06"
[1] "p-values ValorsTest: "
[1] 3.584032e-06 3.584032e-06
[1] "Analysis by class of the Variable: instrumentalness"
[1] "Statistics by groups:"
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000000 0.0000000 0.0000328 0.1591862 0.0473750 1.0000000
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000000 0.0000000 0.0008475 0.2497380 0.6480000 1.0000000
[1] "p-valueANOVA: 9.57965588130597e-11"
[1] "p-value Kruskal-Wallis: 4.72350468857676e-13"
[1] "p-values ValorsTest: "
[1] 3.990808e-12 3.990828e-12
[1] "Analysis by class of the Variable: liveness"
[1] "Statistics by groups:"
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0204 0.1020 0.1425 0.2395 0.3068 0.9920
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0112 0.0991 0.1385 0.2346 0.3068 0.9890
[1] "p-valueANOVA: 0.557457582036754"
[1] "p-value Kruskal-Wallis: 0.4249753015773"
[1] "p-values ValorsTest: "
[1] 0.2796026 0.2796026
[1] "Analysis by class of the Variable: valence"
[1] "Statistics by groups:"
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000 0.2590 0.4880 0.4860 0.7007 0.9830
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000 0.2190 0.4470 0.4588 0.6797 0.9850
[1] "p-valueANOVA: 0.00926221933990095"
[1] "p-value Kruskal-Wallis: 0.0079740097954662"
[1] "p-values ValorsTest: "
[1] 0.004419935 0.004419935
[1] "Analysis by class of the Variable: tempo"
[1] "Statistics by groups:"
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0 100.7 122.4 122.9 140.0 209.1
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 98.19 120.12 119.87 137.69 219.97
[1] "p-valueANOVA: 0.00884088440664359"
[1] "p-value Kruskal-Wallis: 0.00992439951138771"
[1] "p-values ValorsTest: "
[1] 0.004591509 0.004591509
[1] "Variable time_signature"
[1] "Categories=" "0" "1" "3" "4" "5"
[1] "Categories=" "0" "1" "3" "4" "5"
[1] "Cross Table:"
P
major minor
0 2 2
1 21 9
3 187 84
4 1772 864
5 40 19
[1] "Distribucions condicionades a columnes:"
P 0 1 3 4 5
major 0.5000000 0.7000000 0.6900369 0.6722307 0.6779661
minor 0.5000000 0.3000000 0.3099631 0.3277693 0.3220339
[1] "Test Chi quadrat: "
Warning: Chi-squared approximation may be incorrect
Pearson's Chi-squared test
data: dades[, k] and as.factor(P)
X-squared = 1.0024, df = 4, p-value = 0.9094
[1] "valorsTest:"
$rowpf
Xquali
P 0 1 3 4 5
major 0.0009891197 0.0103857567 0.0924826904 0.8763600396 0.0197823937
minor 0.0020449898 0.0092024540 0.0858895706 0.8834355828 0.0194274029
$vtest
Xquali
P 0 1 3 4 5
major -0.74289976 0.30533573 0.59050719 -0.55636144 0.06563934
minor 0.74289976 -0.30533573 -0.59050719 0.55636144 -0.06563934
$pval
Xquali
P 0 1 3 4 5
major 0.2287712 0.3800552 0.2774253 0.2889819 0.4738325
minor 0.2287712 0.3800552 0.2774253 0.2889819 0.4738325
[1] "Variable track_genre"
[1] "Categories=" "acoustic" "afrobeat" "alt-rock" "alternative"
[6] "ambient" "anime" "black-metal" "bluegrass" "blues"
[11] "brazil" "breakbeat" "british" "cantopop" "chicago-house"
[16] "children" "chill" "classical" "club" "comedy"
[21] "country" "dance" "dancehall" "death-metal" "detroit-techno"
[26] "disco" "disney" "drum-and-bass" "edm" "electro"
[31] "electronic" "emo" "folk" "forro" "garage"
[36] "german" "gospel" "goth" "grindcore" "groove"
[41] "grunge" "guitar" "happy" "hard-rock" "hardcore"
[46] "heavy-metal" "hip-hop" "honky-tonk" "idm" "indian"
[51] "indie" "indie-pop" "industrial" "iranian" "j-dance"
[56] "j-idol" "j-pop" "j-rock" "jazz" "k-pop"
[61] "kids" "latin" "latino" "malay" "mandopop"
[66] "metal" "metalcore" "minimal-techno" "mpb" "new-age"
[71] "opera" "pagode" "party" "piano" "pop"
[76] "pop-film" "power-pop" "progressive-house" "psych-rock" "punk"
[81] "punk-rock" "r-n-b" "rock-n-roll" "rockabilly" "romance"
[86] "salsa" "samba" "sertanejo" "show-tunes" "singer-songwriter"
[91] "ska" "sleep" "spanish" "study" "swedish"
[96] "synth-pop" "tango" "trance" "trip-hop" "turkish"
[101] "world-music"
[1] "Cross Table:"
P
major minor
acoustic 19 4
afrobeat 25 16
alt-rock 27 19
alternative 4 4
ambient 35 6
anime 12 20
black-metal 8 9
bluegrass 33 8
blues 2 0
brazil 15 4
breakbeat 22 14
british 51 20
cantopop 41 10
chicago-house 32 35
children 84 14
chill 6 0
classical 18 4
club 21 3
comedy 21 11
country 5 0
dance 2 0
dancehall 5 9
death-metal 12 9
detroit-techno 44 35
disco 3 1
disney 33 10
drum-and-bass 2 2
edm 2 1
electro 8 3
electronic 9 10
emo 20 10
folk 11 7
forro 37 13
garage 25 8
german 14 10
gospel 12 2
goth 32 15
grindcore 24 25
groove 26 8
grunge 28 16
guitar 21 10
happy 19 11
hard-rock 33 12
hardcore 6 6
heavy-metal 63 40
hip-hop 3 5
honky-tonk 106 3
idm 30 31
indian 8 1
indie 1 0
indie-pop 2 0
industrial 30 17
iranian 24 20
j-dance 14 12
j-idol 79 31
j-pop 10 2
j-rock 11 5
jazz 1 1
k-pop 19 18
kids 64 20
latin 0 1
latino 1 2
malay 22 14
mandopop 21 2
metal 10 4
metalcore 13 17
minimal-techno 5 1
mpb 10 6
new-age 47 16
opera 14 6
pagode 39 20
party 15 5
piano 22 6
pop 1 0
pop-film 2 1
power-pop 24 4
progressive-house 2 2
psych-rock 26 11
punk 4 1
punk-rock 13 4
r-n-b 11 0
rock-n-roll 28 7
rockabilly 22 7
romance 19 34
salsa 13 8
samba 20 4
sertanejo 36 8
show-tunes 16 2
singer-songwriter 8 0
ska 27 13
sleep 16 18
spanish 4 7
study 35 39
swedish 5 2
synth-pop 5 13
tango 25 20
trance 3 7
trip-hop 11 16
turkish 2 5
world-music 51 5
[1] "Distribucions condicionades a columnes:"
P acoustic afrobeat alt-rock alternative ambient anime black-metal bluegrass blues
major 0.82608696 0.60975610 0.58695652 0.50000000 0.85365854 0.37500000 0.47058824 0.80487805 1.00000000
minor 0.17391304 0.39024390 0.41304348 0.50000000 0.14634146 0.62500000 0.52941176 0.19512195 0.00000000
P brazil breakbeat british cantopop chicago-house children chill classical club
major 0.78947368 0.61111111 0.71830986 0.80392157 0.47761194 0.85714286 1.00000000 0.81818182 0.87500000
minor 0.21052632 0.38888889 0.28169014 0.19607843 0.52238806 0.14285714 0.00000000 0.18181818 0.12500000
P comedy country dance dancehall death-metal detroit-techno disco disney
major 0.65625000 1.00000000 1.00000000 0.35714286 0.57142857 0.55696203 0.75000000 0.76744186
minor 0.34375000 0.00000000 0.00000000 0.64285714 0.42857143 0.44303797 0.25000000 0.23255814
P drum-and-bass edm electro electronic emo folk forro garage german
major 0.50000000 0.66666667 0.72727273 0.47368421 0.66666667 0.61111111 0.74000000 0.75757576 0.58333333
minor 0.50000000 0.33333333 0.27272727 0.52631579 0.33333333 0.38888889 0.26000000 0.24242424 0.41666667
P gospel goth grindcore groove grunge guitar happy hard-rock hardcore
major 0.85714286 0.68085106 0.48979592 0.76470588 0.63636364 0.67741935 0.63333333 0.73333333 0.50000000
minor 0.14285714 0.31914894 0.51020408 0.23529412 0.36363636 0.32258065 0.36666667 0.26666667 0.50000000
P heavy-metal hip-hop honky-tonk idm indian indie indie-pop industrial iranian
major 0.61165049 0.37500000 0.97247706 0.49180328 0.88888889 1.00000000 1.00000000 0.63829787 0.54545455
minor 0.38834951 0.62500000 0.02752294 0.50819672 0.11111111 0.00000000 0.00000000 0.36170213 0.45454545
P j-dance j-idol j-pop j-rock jazz k-pop kids latin latino
major 0.53846154 0.71818182 0.83333333 0.68750000 0.50000000 0.51351351 0.76190476 0.00000000 0.33333333
minor 0.46153846 0.28181818 0.16666667 0.31250000 0.50000000 0.48648649 0.23809524 1.00000000 0.66666667
P malay mandopop metal metalcore minimal-techno mpb new-age opera pagode
major 0.61111111 0.91304348 0.71428571 0.43333333 0.83333333 0.62500000 0.74603175 0.70000000 0.66101695
minor 0.38888889 0.08695652 0.28571429 0.56666667 0.16666667 0.37500000 0.25396825 0.30000000 0.33898305
P party piano pop pop-film power-pop progressive-house psych-rock punk
major 0.75000000 0.78571429 1.00000000 0.66666667 0.85714286 0.50000000 0.70270270 0.80000000
minor 0.25000000 0.21428571 0.00000000 0.33333333 0.14285714 0.50000000 0.29729730 0.20000000
P punk-rock r-n-b rock-n-roll rockabilly romance salsa samba sertanejo show-tunes
major 0.76470588 1.00000000 0.80000000 0.75862069 0.35849057 0.61904762 0.83333333 0.81818182 0.88888889
minor 0.23529412 0.00000000 0.20000000 0.24137931 0.64150943 0.38095238 0.16666667 0.18181818 0.11111111
P singer-songwriter ska sleep spanish study swedish synth-pop tango
major 1.00000000 0.67500000 0.47058824 0.36363636 0.47297297 0.71428571 0.27777778 0.55555556
minor 0.00000000 0.32500000 0.52941176 0.63636364 0.52702703 0.28571429 0.72222222 0.44444444
P trance trip-hop turkish world-music
major 0.30000000 0.40740741 0.28571429 0.91071429
minor 0.70000000 0.59259259 0.71428571 0.08928571
[1] "Test Chi quadrat: "
Warning: Chi-squared approximation may be incorrect
Pearson's Chi-squared test
data: dades[, k] and as.factor(P)
X-squared = 342.97, df = 99, p-value < 2.2e-16
[1] "valorsTest:"
$rowpf
Xquali
P acoustic afrobeat alt-rock alternative ambient anime black-metal
major 0.0093966370 0.0123639960 0.0133531157 0.0019782394 0.0173095945 0.0059347181 0.0039564787
minor 0.0040899796 0.0163599182 0.0194274029 0.0040899796 0.0061349693 0.0204498978 0.0092024540
Xquali
P bluegrass blues brazil breakbeat british cantopop chicago-house
major 0.0163204748 0.0009891197 0.0074183976 0.0108803165 0.0252225519 0.0202769535 0.0158259149
minor 0.0081799591 0.0000000000 0.0040899796 0.0143149284 0.0204498978 0.0102249489 0.0357873211
Xquali
P children chill classical club comedy country dance
major 0.0415430267 0.0029673591 0.0089020772 0.0103857567 0.0103857567 0.0024727992 0.0009891197
minor 0.0143149284 0.0000000000 0.0040899796 0.0030674847 0.0112474438 0.0000000000 0.0000000000
Xquali
P dancehall death-metal detroit-techno disco disney drum-and-bass edm
major 0.0024727992 0.0059347181 0.0217606330 0.0014836795 0.0163204748 0.0009891197 0.0009891197
minor 0.0092024540 0.0092024540 0.0357873211 0.0010224949 0.0102249489 0.0020449898 0.0010224949
Xquali
P electro electronic emo folk forro garage german
major 0.0039564787 0.0044510386 0.0098911968 0.0054401583 0.0182987141 0.0123639960 0.0069238378
minor 0.0030674847 0.0102249489 0.0102249489 0.0071574642 0.0132924335 0.0081799591 0.0102249489
Xquali
P gospel goth grindcore groove grunge guitar happy
major 0.0059347181 0.0158259149 0.0118694362 0.0128585559 0.0138476756 0.0103857567 0.0093966370
minor 0.0020449898 0.0153374233 0.0255623722 0.0081799591 0.0163599182 0.0102249489 0.0112474438
Xquali
P hard-rock hardcore heavy-metal hip-hop honky-tonk idm indian
major 0.0163204748 0.0029673591 0.0311572700 0.0014836795 0.0524233432 0.0148367953 0.0039564787
minor 0.0122699387 0.0061349693 0.0408997955 0.0051124744 0.0030674847 0.0316973415 0.0010224949
Xquali
P indie indie-pop industrial iranian j-dance j-idol j-pop
major 0.0004945598 0.0009891197 0.0148367953 0.0118694362 0.0069238378 0.0390702275 0.0049455984
minor 0.0000000000 0.0000000000 0.0173824131 0.0204498978 0.0122699387 0.0316973415 0.0020449898
Xquali
P j-rock jazz k-pop kids latin latino malay
major 0.0054401583 0.0004945598 0.0093966370 0.0316518299 0.0000000000 0.0004945598 0.0108803165
minor 0.0051124744 0.0010224949 0.0184049080 0.0204498978 0.0010224949 0.0020449898 0.0143149284
Xquali
P mandopop metal metalcore minimal-techno mpb new-age opera
major 0.0103857567 0.0049455984 0.0064292779 0.0024727992 0.0049455984 0.0232443126 0.0069238378
minor 0.0020449898 0.0040899796 0.0173824131 0.0010224949 0.0061349693 0.0163599182 0.0061349693
Xquali
P pagode party piano pop pop-film power-pop progressive-house
major 0.0192878338 0.0074183976 0.0108803165 0.0004945598 0.0009891197 0.0118694362 0.0009891197
minor 0.0204498978 0.0051124744 0.0061349693 0.0000000000 0.0010224949 0.0040899796 0.0020449898
Xquali
P psych-rock punk punk-rock r-n-b rock-n-roll rockabilly romance
major 0.0128585559 0.0019782394 0.0064292779 0.0054401583 0.0138476756 0.0108803165 0.0093966370
minor 0.0112474438 0.0010224949 0.0040899796 0.0000000000 0.0071574642 0.0071574642 0.0347648262
Xquali
P salsa samba sertanejo show-tunes singer-songwriter ska sleep
major 0.0064292779 0.0098911968 0.0178041543 0.0079129575 0.0039564787 0.0133531157 0.0079129575
minor 0.0081799591 0.0040899796 0.0081799591 0.0020449898 0.0000000000 0.0132924335 0.0184049080
Xquali
P spanish study swedish synth-pop tango trance trip-hop
major 0.0019782394 0.0173095945 0.0024727992 0.0024727992 0.0123639960 0.0014836795 0.0054401583
minor 0.0071574642 0.0398773006 0.0020449898 0.0132924335 0.0204498978 0.0071574642 0.0163599182
Xquali
P turkish world-music
major 0.0009891197 0.0252225519
minor 0.0051124744 0.0051124744
$vtest
Xquali
P acoustic afrobeat alt-rock alternative ambient anime black-metal bluegrass
major 1.56202633 -0.88363567 -1.26920504 -1.05132097 2.47109352 -3.62773962 -1.79430299 1.80014769
minor -1.56202633 0.88363567 1.26920504 1.05132097 -2.47109352 3.62773962 1.79430299 -1.80014769
Xquali
P blues brazil breakbeat british cantopop chicago-house children chill
major 0.98387214 1.07721084 -0.80985627 0.80610523 1.99641506 -3.46831336 3.93256761 1.70525451
minor -0.98387214 -1.07721084 0.80985627 -0.80610523 -1.99641506 3.46831336 -3.93256761 -1.70525451
Xquali
P classical club comedy country dance dancehall death-metal detroit-techno
major 1.44804271 2.10914819 -0.21535912 1.55641737 0.98387214 -2.53515488 -1.00628889 -2.24903619
minor -1.44804271 -2.10914819 0.21535912 -1.55641737 -0.98387214 2.53515488 1.00628889 2.24903619
Xquali
P disco disney drum-and-bass edm electro electronic emo folk
major 0.32448495 1.31665478 -0.74289976 -0.02711069 0.37762453 -1.86867112 -0.08612033 -0.57092391
minor -0.32448495 -1.31665478 0.74289976 0.02711069 -0.37762453 1.86867112 0.08612033 0.57092391
Xquali
P forro garage german gospel goth grindcore groove grunge
major 1.00401409 1.02991266 -0.95139023 1.46531495 0.10099435 -2.77354083 1.13477881 -0.53654192
minor -1.00401409 -1.02991266 0.95139023 -1.46531495 -0.10099435 2.77354083 -1.13477881 0.53654192
Xquali
P guitar happy hard-rock hardcore heavy-metal hip-hop honky-tonk idm
major 0.04082647 -0.47757640 0.85555541 -1.28846152 -1.37372348 -1.80658028 6.77207948 -3.06709735
minor -0.04082647 0.47757640 -0.85555541 1.28846152 1.37372348 1.80658028 -6.77207948 3.06709735
Xquali
P indian indie indie-pop industrial iranian j-dance j-idol j-pop
major 1.37736451 0.69558666 0.98387214 -0.52629977 -1.83253689 -1.48081438 1.00719343 1.17985557
minor -1.37736451 -0.69558666 -0.98387214 0.52629977 1.83253689 1.48081438 -1.00719343 -1.17985557
Xquali
P j-rock jazz k-pop kids latin latino malay mandopop
major 0.11550911 -0.52513421 -2.09553725 1.74333225 -1.43811476 -1.25941475 -0.80985627 2.45512314
minor -0.11550911 0.52513421 2.09553725 -1.74333225 1.43811476 1.25941475 0.80985627 -2.45512314
Xquali
P metal metalcore minimal-techno mpb new-age opera pagode party
major 0.32232357 -2.82631279 0.83344750 -0.41925528 1.23271909 0.24888693 -0.21487067 0.72751565
minor -0.32232357 2.82631279 -0.83344750 0.41925528 -1.23271909 -0.24888693 0.21487067 -0.72751565
Xquali
P piano pop pop-film power-pop progressive-house psych-rock punk punk-rock
major 1.26702506 0.69558666 -0.02711069 2.07714339 -0.74289976 0.37478285 0.60156009 0.80012007
minor -1.26702506 -0.69558666 0.02711069 -2.07714339 0.74289976 -0.37478285 -0.60156009 -0.80012007
Xquali
P r-n-b rock-n-roll rockabilly romance salsa samba sertanejo show-tunes
major 2.31085590 1.59960995 0.97689101 -4.94404136 -0.53911670 1.67192841 2.05544803 1.95082482
minor -2.31085590 -1.59960995 -0.97689101 4.94404136 0.53911670 -1.67192841 -2.05544803 -1.95082482
Xquali
P singer-songwriter ska sleep spanish study swedish synth-pop tango
major 1.96971630 0.01358332 -2.54478931 -2.20001730 -3.73555408 0.22765050 -3.59702239 -1.70790649
minor -1.96971630 -0.01358332 2.54478931 2.20001730 3.73555408 -0.22765050 3.59702239 1.70790649
Xquali
P trance trip-hop turkish world-music
major -2.52730634 -2.96861847 -2.19416333 3.81479723
minor 2.52730634 2.96861847 2.19416333 -3.81479723
$pval
Xquali
P acoustic afrobeat alt-rock alternative ambient anime black-metal
major 5.914089e-02 1.884465e-01 1.021840e-01 1.465556e-01 6.735029e-03 1.429567e-04 3.638241e-02
minor 5.914089e-02 1.884465e-01 1.021840e-01 1.465556e-01 6.735029e-03 1.429567e-04 3.638241e-02
Xquali
P bluegrass blues brazil breakbeat british cantopop chicago-house
major 3.591866e-02 1.625892e-01 1.406930e-01 2.090114e-01 2.100911e-01 2.294438e-02 2.618681e-04
minor 3.591866e-02 1.625892e-01 1.406930e-01 2.090114e-01 2.100911e-01 2.294438e-02 2.618681e-04
Xquali
P children chill classical club comedy country dance
major 4.202167e-05 4.407348e-02 7.380255e-02 1.746590e-02 4.147437e-01 5.980444e-02 1.625892e-01
minor 4.202167e-05 4.407348e-02 7.380255e-02 1.746590e-02 4.147437e-01 5.980444e-02 1.625892e-01
Xquali
P dancehall death-metal detroit-techno disco disney drum-and-bass edm
major 5.619881e-03 1.571383e-01 1.225510e-02 3.727855e-01 9.397718e-02 2.287712e-01 4.891857e-01
minor 5.619881e-03 1.571383e-01 1.225510e-02 3.727855e-01 9.397718e-02 2.287712e-01 4.891857e-01
Xquali
P electro electronic emo folk forro garage german
major 3.528548e-01 3.083429e-02 4.656854e-01 2.840256e-01 1.576859e-01 1.515255e-01 1.707032e-01
minor 3.528548e-01 3.083429e-02 4.656854e-01 2.840256e-01 1.576859e-01 1.515255e-01 1.707032e-01
Xquali
P gospel goth grindcore groove grunge guitar happy
major 7.141750e-02 4.597775e-01 2.772494e-03 1.282340e-01 2.957920e-01 4.837171e-01 3.164759e-01
minor 7.141750e-02 4.597775e-01 2.772494e-03 1.282340e-01 2.957920e-01 4.837171e-01 3.164759e-01
Xquali
P hard-rock hardcore heavy-metal hip-hop honky-tonk idm indian
major 1.961219e-01 9.879268e-02 8.476377e-02 3.541387e-02 6.347218e-12 1.080742e-03 8.419979e-02
minor 1.961219e-01 9.879268e-02 8.476377e-02 3.541387e-02 6.347256e-12 1.080742e-03 8.419979e-02
Xquali
P indie indie-pop industrial iranian j-dance j-idol j-pop
major 2.433439e-01 1.625892e-01 2.993400e-01 3.343574e-02 6.932802e-02 1.569209e-01 1.190288e-01
minor 2.433439e-01 1.625892e-01 2.993400e-01 3.343574e-02 6.932802e-02 1.569209e-01 1.190288e-01
Xquali
P j-rock jazz k-pop kids latin latino malay
major 4.540208e-01 2.997449e-01 1.806163e-02 4.063780e-02 7.520075e-02 1.039403e-01 2.090114e-01
minor 4.540208e-01 2.997449e-01 1.806163e-02 4.063780e-02 7.520075e-02 1.039403e-01 2.090114e-01
Xquali
P mandopop metal metalcore minimal-techno mpb new-age opera
major 7.041817e-03 3.736038e-01 2.354363e-03 2.022962e-01 3.375148e-01 1.088403e-01 4.017241e-01
minor 7.041817e-03 3.736038e-01 2.354363e-03 2.022962e-01 3.375148e-01 1.088403e-01 4.017241e-01
Xquali
P pagode party piano pop pop-film power-pop progressive-house
major 4.149341e-01 2.334551e-01 1.025732e-01 2.433439e-01 4.891857e-01 1.889416e-02 2.287712e-01
minor 4.149341e-01 2.334551e-01 1.025732e-01 2.433439e-01 4.891857e-01 1.889416e-02 2.287712e-01
Xquali
P psych-rock punk punk-rock r-n-b rock-n-roll rockabilly romance
major 3.539110e-01 2.737335e-01 2.118206e-01 1.042041e-02 5.484257e-02 1.643116e-01 3.825973e-07
minor 3.539110e-01 2.737335e-01 2.118206e-01 1.042041e-02 5.484257e-02 1.643116e-01 3.825973e-07
Xquali
P salsa samba sertanejo show-tunes singer-songwriter ska sleep
major 2.949032e-01 4.726922e-02 1.991788e-02 2.553894e-02 2.443545e-02 4.945812e-01 5.467185e-03
minor 2.949032e-01 4.726922e-02 1.991788e-02 2.553894e-02 2.443545e-02 4.945812e-01 5.467185e-03
Xquali
P spanish study swedish synth-pop tango trance trip-hop
major 1.390283e-02 9.365116e-05 4.099590e-01 1.609404e-04 4.382685e-02 5.747060e-03 1.495709e-03
minor 1.390283e-02 9.365116e-05 4.099590e-01 1.609404e-04 4.382685e-02 5.747060e-03 1.495709e-03
Xquali
P turkish world-music
major 1.411183e-02 6.814740e-05
minor 1.411183e-02 6.814740e-05
[1] "Variable multiple_artists"
[1] "Categories=" "FALSE" "TRUE"
[1] "Categories=" "FALSE" "TRUE"
[1] "Cross Table:"
P
major minor
FALSE 1965 962
TRUE 57 16
[1] "Distribucions condicionades a columnes:"
P FALSE TRUE
major 0.6713358 0.7808219
minor 0.3286642 0.2191781
[1] "Test Chi quadrat: "
Pearson's Chi-squared test with Yates' continuity correction
data: dades[, k] and as.factor(P)
X-squared = 3.4033, df = 1, p-value = 0.06506
[1] "valorsTest:"
$rowpf
Xquali
P FALSE TRUE
major 0.97181009 0.02818991
minor 0.98364008 0.01635992
$vtest
Xquali
P FALSE TRUE
major -1.971207 1.971207
minor 1.971207 -1.971207
$pval
Xquali
P FALSE TRUE
major 0.02435008 0.02435008
minor 0.02435008 0.02435008
[1] "Variable tempo_cat"
[1] "Categories=" "Larghissimo" "Grave" "Lento/Largo" "Larghetto" "Adagio" "Andante"
[8] "Moderato" "Allegro" "Vivace" "Presto" "Prestissimo"
[1] "Categories=" "Larghissimo" "Grave" "Lento/Largo" "Larghetto" "Adagio" "Andante"
[8] "Moderato" "Allegro" "Vivace" "Presto" "Prestissimo"
[1] "Cross Table:"
P
major minor
Larghissimo 0 0
Grave 0 0
Lento/Largo 9 1
Larghetto 13 10
Adagio 63 33
Andante 551 307
Moderato 301 121
Allegro 908 427
Vivace 75 44
Presto 86 30
Prestissimo 14 3
[1] "Distribucions condicionades a columnes:"
P       Larghissimo Grave Lento/Largo Larghetto    Adagio   Andante  Moderato   Allegro    Vivace    Presto
  major                     0.9000000 0.5652174 0.6562500 0.6421911 0.7132701 0.6801498 0.6302521 0.7413793
  minor                     0.1000000 0.4347826 0.3437500 0.3578089 0.2867299 0.3198502 0.3697479 0.2586207
P       Prestissimo
  major   0.8235294
  minor   0.1764706
[1] "Test Chi quadrat: "
Warning: Chi-squared approximation may be incorrect
Pearson's Chi-squared test
data: dades[, k] and as.factor(P)
X-squared = 16.012, df = 8, p-value = 0.04221
[1] "valorsTest:"
$rowpf
Xquali
P Larghissimo Grave Lento/Largo Larghetto Adagio Andante Moderato Allegro
major 0.000000000 0.000000000 0.004455446 0.006435644 0.031188119 0.272772277 0.149009901 0.449504950
minor 0.000000000 0.000000000 0.001024590 0.010245902 0.033811475 0.314549180 0.123975410 0.437500000
Xquali
P Vivace Presto Prestissimo
major 0.037128713 0.042574257 0.006930693
minor 0.045081967 0.030737705 0.003073770
$vtest
Xquali
P Larghissimo Grave Lento/Largo Larghetto Adagio Andante Moderato Allegro Vivace
major 0.0000000 0.0000000 1.5259103 -1.1198620 -0.3821151 -2.3706107 1.8460768 0.6195929 -1.0446552
minor 0.0000000 0.0000000 -1.5259103 1.1198620 0.3821151 2.3706107 -1.8460768 -0.6195929 1.0446552
Xquali
P Presto Prestissimo
major 1.5738795 1.3172030
minor -1.5738795 -1.3172030
$pval
Xquali
P Larghissimo Grave Lento/Largo Larghetto Adagio Andante Moderato Allegro
major 0.500000000 0.500000000 0.063516101 0.131386282 0.351187992 0.008879362 0.032440526 0.267762926
minor 0.500000000 0.500000000 0.063516101 0.131386282 0.351187992 0.008879362 0.032440526 0.267762926
Xquali
P Vivace Presto Prestissimo
major 0.148091186 0.057757651 0.093885296
minor 0.148091186 0.057757651 0.093885296
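The `$vtest` and `$pval` values printed for the categorical variables can be reproduced by hand from a cross table. The sketch below uses the counts printed for `explicit` and assumes the usual test-value formula (within-class share minus overall share, divided by its standard error under independence). The definition of `ValorTestXquali` is not shown in this section, so this is an assumption about what it does. The names `tab`, `se` and `chi` are illustrative only.

```r
# Cross table of mode (rows) by explicit (columns), copied from the output above.
tab <- matrix(c(1899, 123,
                 913,  65),
              nrow = 2, byrow = TRUE,
              dimnames = list(P = c("major", "minor"),
                              explicit = c("False", "True")))

n  <- sum(tab)
pk <- rowSums(tab) / n          # share of each class (major, minor)
pj <- colSums(tab) / n          # overall share of each category

rowpf <- tab / rowSums(tab)     # category shares within each class

# Assumed formula: standard error of the within-class share under independence
se <- sqrt(outer((1 - pk) / (n * pk), pj * (1 - pj)))
vtest <- (rowpf - matrix(pj, nrow = 2, ncol = 2, byrow = TRUE)) / se
pval  <- pnorm(-abs(vtest))     # one-sided tail probability

# chisq.test() applies Yates' continuity correction by default for 2x2 tables
chi <- chisq.test(tab)

print(round(vtest, 4))
print(round(pval, 4))
print(chi$statistic)
```

Under this assumption the values agree with the printed `explicit` block, which suggests the one-sided p-values in `$pval` are about half the size of a two-sided test and should be read accordingly.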
# most significant descriptors of each class. TODO: add info for the qualitative variables
for (c in 1:length(levels(as.factor(P)))) {
if(!is.na(levels(as.factor(P))[c])){
print(paste("P.values per class:",levels(as.factor(P))[c]));
print(sort(pvalk[c,]), digits=3)
}
}
[1] "P.values per class: major"
explicit key mode time_signature track_genre multiple_artists
0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00
tempo_cat instrumentalness energy acousticness speechiness duration_ms
0.00e+00 3.99e-12 1.12e-06 3.58e-06 6.72e-04 9.37e-04
valence tempo popularity liveness danceability loudness
4.42e-03 4.59e-03 1.76e-02 2.80e-01 4.59e-01 4.85e-01
[1] "P.values per class: minor"
explicit key mode time_signature track_genre multiple_artists
0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00
tempo_cat instrumentalness energy acousticness speechiness duration_ms
0.00e+00 3.99e-12 1.12e-06 3.58e-06 6.72e-04 9.37e-04
valence tempo popularity liveness danceability loudness
4.42e-03 4.59e-03 1.76e-02 2.80e-01 4.59e-01 4.85e-01
# TODO: add the information on the categories of the qualitative variables to the list of p-values and sort globally
#saving the dataframe in an external file
#write.table(dd, file = "credscoClean.csv", sep = ";", na = "NA", dec = ".", row.names = FALSE, col.names = TRUE)
Findings:
From the boxplot of valence vs mode, we can see that minor songs tend to have lower valence (a sadder mood) than major songs. Similarly, songs in a minor key tend to have a lower tempo than songs in a major key, although there are outliers.
Interestingly, the most common keys for major songs are G, C, D and A, whereas the most common keys for minor songs are A, E and B.
For popularity, songs in a minor key span a slightly wider range (0 to 97) than songs in a major key (0 to 93).
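The ranking of keys within each mode can be read directly from the cross table rather than by eye. A minimal sketch, with the counts copied from the `key` cross table above (`key_by_mode` and `top_keys` are names introduced here for illustration):

```r
keys <- c("A", "A#/Bb", "B", "C", "C#/Db", "D",
          "D#/Eb", "E", "F", "F#/Gb", "G", "G#/Ab")

# Counts copied from the "Cross Table" printed for the variable key
key_by_mode <- cbind(
  major = c(202, 106, 102, 291, 170, 266, 66, 149, 151, 87, 315, 117),
  minor = c(124,  81, 119,  75,  80,  80, 24, 120, 101, 73,  68,  33)
)
rownames(key_by_mode) <- keys

# Names of the n most frequent keys in one column of the table
top_keys <- function(counts, n) {
  names(sort(counts, decreasing = TRUE))[seq_len(n)]
}

top_major <- top_keys(key_by_mode[, "major"], 4)
top_minor <- top_keys(key_by_mode[, "minor"], 3)
print(top_major)
print(top_minor)
```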
Genre vs valence.
library(dplyr)
library(forcats)
plotdata <- dd %>%
group_by(track_genre) %>%
summarize(mean_valence = mean(valence))
# plot mean valence by genre
ggplot(plotdata,
aes(x = fct_reorder(track_genre, mean_valence),
y = mean_valence)) +
geom_bar(stat = "identity") +
scale_x_discrete(guide = guide_axis(angle = 90)) +
xlab("Genre") + ylab("Mean Valence")
One genre stands out with the highest mean valence: R&B (r-n-b in the dataset). The sleep genre has the lowest mean valence.
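With around a hundred genres on the x-axis, the extremes are easier to confirm from the summarised table than from the bar chart. The sketch below shows one way to do that in base R. The data frame `toy` and its values are invented for illustration and stand in for `dd`.

```r
# Toy stand-in for dd; the values are invented for illustration only.
toy <- data.frame(
  track_genre = c("salsa", "salsa", "sleep", "sleep", "rock", "rock"),
  valence     = c(0.90, 0.80, 0.10, 0.20, 0.50, 0.60)
)

# Mean valence per genre, sorted from lowest to highest
genre_means <- aggregate(valence ~ track_genre, data = toy, FUN = mean)
genre_means <- genre_means[order(genre_means$valence), ]

lowest  <- head(genre_means, 1)   # genre with the lowest mean valence
highest <- tail(genre_means, 1)   # genre with the highest mean valence
print(lowest)
print(highest)
```

The same pattern should apply to `plotdata`, sorting on `mean_valence` instead of `valence`.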
Next, we examine the relationship between genre and energy by calculating the mean energy per genre.
plotdata2 <- dd %>%
group_by(track_genre) %>%
summarize(mean_energy = mean(energy))
# plot mean energy by genre
ggplot(plotdata2,
aes(x = fct_reorder(track_genre, mean_energy),
y = mean_energy)) +
geom_bar(stat = "identity") +
scale_x_discrete(guide = guide_axis(angle = 90))
Classical songs have the lowest mean energy, and drum-and-bass songs have the highest.
We can also examine the relationship between genre and danceability.
plotdata3 <- dd %>%
group_by(track_genre) %>%
summarize(mean_danceability = mean(danceability))
# plot mean danceability by genre
ggplot(plotdata3,
aes(x = fct_reorder(track_genre, mean_danceability),
y = mean_danceability)) +
geom_bar(stat = "identity") +
scale_x_discrete(guide = guide_axis(angle = 90))
ggplot(dd,
aes(x = mode,
y = valence)) +
geom_boxplot() +
labs(title = "Valence distribution by mode")
Energy vs mode
ggplot(dd,
aes(x = mode,
y = energy)) +
geom_boxplot() +
labs(title = "Energy distribution by mode")
Minor songs actually tend to have higher energy than major songs (median 0.711 vs 0.635 in the group statistics above), which is interesting because one would expect happy songs (typically in a major key) to have higher energy. One possible explanation is that many Latin songs are in a minor key, although this sample contains only a handful of tracks in the latin and latino genres.
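When comparing boxplots it helps to separate the level of a variable (median) from its spread (IQR, range), since a group can sit higher while covering a narrower interval. A small sketch of such a per-group summary follows. The `toy` data and the helper `spread_and_level` are hypothetical and only illustrate the idea.

```r
# Toy stand-in for dd; the values are invented for illustration only.
toy <- data.frame(
  mode   = c("major", "major", "major", "major",
             "minor", "minor", "minor", "minor"),
  energy = c(0.10, 0.40, 0.60, 0.90,
             0.50, 0.70, 0.80, 0.90)
)

# Level (median) and two measures of spread for one group
spread_and_level <- function(x) {
  c(median = median(x), iqr = IQR(x), range = diff(range(x)))
}

# One row per mode, one column per statistic
by_mode <- t(sapply(split(toy$energy, toy$mode), spread_and_level))
print(round(by_mode, 3))
```

In this toy example the minor group has the higher median but the smaller range, which is the kind of distinction the boxplot comparison relies on.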
ggplot(dd,
aes(x = mode,
y = acousticness)) +
geom_boxplot() +
labs(title = "Acousticness distribution by mode")
Looking at valence vs explicitness, we see that songs that are explicit tend to have lower valence than songs that are clean.
ggplot(dd,
aes(x = explicit,
y = valence)) +
geom_boxplot() +
labs(title = "Valence distribution by explicit") +
scale_x_discrete(guide = guide_axis(angle = 90))
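A boxplot comparison like this one can be backed by a rank-based test, in the same spirit as the Kruskal-Wallis p-values reported earlier. The sketch below runs a one-sided Wilcoxon rank-sum test on invented data. `toy` is a hypothetical stand-in for `dd`, and the result says nothing about the real dataset.

```r
# Toy stand-in for dd; the values are invented for illustration only.
toy <- data.frame(
  explicit = rep(c("False", "True"), each = 6),
  valence  = c(0.55, 0.62, 0.70, 0.48, 0.81, 0.66,
               0.30, 0.42, 0.25, 0.51, 0.38, 0.45)
)

# H1: valence of clean tracks ("False", the first level) is shifted
# towards higher values than valence of explicit tracks ("True")
res <- wilcox.test(valence ~ explicit, data = toy, alternative = "greater")
print(res$statistic)
print(res$p.value)
```

On the full data, ties in `valence` would make R fall back to a normal approximation for the p-value and issue a warning, which is expected behaviour.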
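# keep only the numerical columns (selected by position)
numerical_only <- dd %>% select(1:2, 4:5, 7, 9:14)
pairs(numerical_only)
# panel function (adapted from the examples in ?pairs): prints the absolute
# correlation, with text size proportional to its magnitude
panel.cor <- function(x, y, digits = 2, prefix = "", cex.cor, ...)
{
  # save the plot coordinates and restore them when the panel is done
  usr <- par("usr"); on.exit(par(usr = usr))
  par(usr = c(0, 1, 0, 1))
  r <- abs(cor(x, y))
  txt <- format(c(r, 0.123456789), digits = digits)[1]
  txt <- paste0(prefix, txt)
  if(missing(cex.cor)) cex.cor <- 0.8/strwidth(txt)
  text(0.5, 0.5, txt, cex = cex.cor * r)
}
# scatterplots with smoothers below the diagonal, correlations above
pairs(numerical_only, lower.panel = panel.smooth, upper.panel = panel.cor,
      gap=0, row1attop=FALSE)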
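The scatterplot matrix shows only absolute correlations, and with eleven variables the panels are small. A numerical ranking of the pairs can complement it. The sketch below is one way to build that ranking. The `toy` data frame is invented for illustration, and the same steps should apply to `numerical_only`.

```r
# Toy stand-in for numerical_only; the values are invented for illustration only.
toy <- data.frame(
  energy       = c(0.20, 0.35, 0.50, 0.65, 0.80, 0.95),
  loudness     = c(-20, -15, -11, -8, -6, -4),
  acousticness = c(0.90, 0.80, 0.55, 0.40, 0.20, 0.05),
  tempo        = c(120, 95, 140, 100, 130, 110)
)

cors <- cor(toy)

# Keep each pair once (upper triangle), with the sign of the correlation
idx <- which(upper.tri(cors), arr.ind = TRUE)
pairs_tbl <- data.frame(
  var1 = rownames(cors)[idx[, "row"]],
  var2 = colnames(cors)[idx[, "col"]],
  r    = cors[idx]
)

# Strongest relationships first, regardless of direction
pairs_tbl <- pairs_tbl[order(-abs(pairs_tbl$r)), ]
print(pairs_tbl, digits = 2)
```

Keeping the sign in `r` makes it possible to tell positive from negative relationships, which `panel.cor` hides by taking `abs()`.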